4.5 Multiple Hypotheses Testing

in practice, we often have more than two models that we like to compare based on their estimated generalization performance

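(Up to this point, two models were compared with McNemar's test.) In practice, the comparison often involves three or more models, judged on their estimated generalization performance.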

applying the testing procedure described earlier multiple times will result in a typical issue called "multiple hypotheses testing."

Running the earlier test procedure several times, once per comparison, creates the well-known problem called multiple hypotheses testing.

TODO: multiple hypotheses testing is apparently discussed briefly in "3 Cross-validation and Hyperparameter Optimization" (where exactly?)

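Approach (procedure) for dealing with multiple hypotheses testing

1. Run an omnibus test under the null hypothesis that there is no difference between the classification accuracies of the models

2. If the omnibus test rejects the null hypothesis, run pairwise post hoc tests, with a correction for multiple comparisons

McNemar's test, for example, can be used for the pairwise tests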
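The pairwise step can be sketched as follows. This is a minimal illustration and not code from the paper: the function name `mcnemar_test` and the toy labels are assumptions made for this note. It uses the chi-squared form of McNemar's test with continuity correction; when there are few discordant pairs, an exact binomial version is usually preferred.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's test (chi-squared, continuity-corrected) for two classifiers.

    Hypothetical helper for illustration; returns (chi2, p_value).
    """
    # b: model A correct and model B wrong; c: model A wrong and model B correct
    b = sum(1 for t, ya, yb in zip(y_true, pred_a, pred_b) if ya == t and yb != t)
    c = sum(1 for t, ya, yb in zip(y_true, pred_a, pred_b) if ya != t and yb == t)
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data (made up): 20 samples, 9 where only A is right, 1 where only B is right
y_true = [1] * 20
pred_a = [1] * 9 + [0] * 1 + [1] * 6 + [0] * 4
pred_b = [0] * 9 + [1] * 1 + [1] * 6 + [0] * 4
chi2, p_value = mcnemar_test(y_true, pred_a, pred_b)
print(chi2, p_value)
```

In the two-step procedure, this test would be run once per model pair after the omnibus test has rejected, and the resulting p-values would then be corrected for multiple comparisons.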

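An omnibus test is a statistical test designed to check whether random samples deviate from a null hypothesis

Example: Analysis of Variance (ANOVA)

Used to test the null hypothesis that the means of several groups are equal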

To compare multiple machine learning models, Cochran’s Q test would be a possible choice,

For comparing multiple machine learning models, Cochran's Q test is one feasible choice

Cochran's Q test is essentially a generalization of McNemar's test to three or more models
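To make the omnibus step concrete, here is a minimal sketch of Cochran's Q test using only the standard library. The names `cochrans_q` and `chi2_sf` and the toy data are assumptions for this note, not the paper's code. Each model's predictions are turned into a 0/1 "correct" indicator per sample, and Q is compared against a chi-squared distribution with k - 1 degrees of freedom, where k is the number of models.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, sf = 1.0, math.exp(-z)              # Q(1, z)
    else:
        a, sf = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        sf += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return sf


def cochrans_q(y_true, *model_preds):
    """Cochran's Q test for k >= 2 classifiers; returns (Q, p_value)."""
    k = len(model_preds)
    if k < 2:
        raise ValueError("need at least two models")
    # correct[i][j] is 1 if model j classified sample i correctly
    correct = [[int(p[i] == t) for p in model_preds] for i, t in enumerate(y_true)]
    col_totals = [sum(row[j] for row in correct) for j in range(k)]
    row_totals = [sum(row) for row in correct]
    total = sum(row_totals)
    denom = k * total - sum(r * r for r in row_totals)
    if denom == 0:
        return 0.0, 1.0  # all models agree on every sample
    q = (k - 1) * (k * sum(g * g for g in col_totals) - total ** 2) / denom
    return q, chi2_sf(q, k - 1)


# Toy data (made up): three models with 8, 6 and 2 correct out of 10 samples
y_true = [0] * 10
m1 = [0] * 8 + [1] * 2
m2 = [0] * 6 + [1] * 4
m3 = [0] * 2 + [1] * 8
q, p_value = cochrans_q(y_true, m1, m2, m3)
print(q, p_value)  # Q = 28/3 with 2 degrees of freedom
```

A small p-value here only says that the models do not all share the same accuracy; it does not say which ones differ.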

omnibus tests such as Cochran’s Q only tell us that a group of models differs or not.

In other words, an omnibus test only says whether the group of models differs at all, not which models differ

we may conclude that there is at least one significant difference among the different models.

(This is the conclusion available when the omnibus test rejects the null hypothesis: at least one significant difference may exist among the models.)

Post hoc testing

it is required to compare all possible pairs of models with each other

That is, every possible pair of models has to be tested

This is where the multiple hypotheses testing problem appears
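A short worked example shows why comparing every pair becomes a problem. The helper names and the four model labels are made up for illustration, and the error-rate formula assumes the tests are independent, which is a simplification when all models are evaluated on the same test set.

```python
from itertools import combinations


def pairwise_comparisons(model_names):
    """All unordered pairs of models: k * (k - 1) / 2 of them for k models."""
    return list(combinations(model_names, 2))


def family_wise_error_rate(alpha, n_tests):
    """Chance of at least one false positive over n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


models = ["model_1", "model_2", "model_3", "model_4"]  # hypothetical labels
pairs = pairwise_comparisons(models)
print(len(pairs))                                          # 6
print(round(family_wise_error_rate(0.05, len(pairs)), 3))  # 0.265
```

With only four models and a per-test significance level of 0.05, the chance of at least one spurious "significant" difference is already about 26% under this simplified assumption.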

However, please keep in mind that these are all approximations and everything concerning statistical tests and reusing test sets (independence violation) should be taken with (at least) a grain of salt.

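These procedures are all approximations, and everything concerning statistical tests and reusing test sets (which violates independence) should be taken with (at least) a grain of salt.

https://eow.alc.co.jp/search?q=take+~+with+a+grain+of+salt (Japanese gloss: まゆつば, i.e. something to treat with skepticism)

(Thought: reusing a single test set over and over violates independence, which is presumably why "repeated" appears in parentheses in the conclusion.)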

using the Bonferroni correction is a means to reduce the false positive rate in multiple comparison tests

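The Bonferroni correction lowers the false positive rate across multiple comparisons by testing each of the m comparisons at level α/m instead of α

False positive = here, rejecting the null hypothesis (concluding there is a significant difference) when there is actually no difference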
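The Bonferroni correction itself is a one-line adjustment. The sketch below is illustrative only: the function name `bonferroni` and the three p-values are assumptions, standing in for the p-values of three pairwise post hoc tests.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m tests.

    Multiplies each p-value by m (capped at 1) and rejects when the adjusted
    value is at most alpha, which is equivalent to testing at alpha / m.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


# Made-up p-values from three pairwise comparisons
adjusted, reject = bonferroni([0.004, 0.03, 0.20])
print(adjusted)
print(reject)  # [True, False, False]
```

Note that the second comparison (p = 0.03) would count as significant at α = 0.05 on its own, but not after the correction. That is the intended trade-off: fewer false positives at the cost of some power.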